Prospects for building large timetrees using molecular data with incomplete gene coverage among species.

Authors

  • Alan Filipski
  • Oscar Murillo
  • Anna Freydenzon
  • Koichiro Tamura
  • Sudhir Kumar

Abstract

Scientists are assembling sequence data sets from increasing numbers of species and genes to build comprehensive timetrees. However, data are often unavailable for some species and gene combinations, and the proportion of missing data is often large for data sets containing many genes and species. Surprisingly, there has not been a systematic analysis of the effect of the degree of sparseness of the species-gene matrix on the accuracy of divergence time estimates. Here, we present results from computer simulations and empirical data analyses to quantify the impact of missing gene data on divergence time estimation in large phylogenies. We found that estimates of divergence times were robust even when sequences from a majority of genes for most of the species were absent. From the analysis of such extremely sparse data sets, we found that the most egregious errors occurred for nodes in the tree that had no common genes for any pair of species in the immediate descendant clades of the node in question. These problematic nodes can be easily detected prior to computational analyses based only on the input sequence alignment and the tree topology. We conclude that it is best to use larger alignments, because adding both genes and species to the alignment augments the number of genes available for estimating divergence events deep in the tree and improves their time estimates.
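
The pre-analysis check described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the tree is given as nested tuples of species names and that gene coverage is a mapping from each species to the set of genes sequenced for it. The function name `nodes_without_shared_genes` and the toy data are hypothetical. A node is flagged when no pair of species drawn from two different immediate descendant clades has any gene in common.

```python
from itertools import combinations, product


def leaves(node):
    """Return the species names under a node (leaves are strings)."""
    if isinstance(node, str):
        return [node]
    return [name for child in node for name in leaves(child)]


def nodes_without_shared_genes(tree, coverage):
    """List internal nodes whose descendant clades share no gene.

    tree     -- nested tuples, e.g. (("A", "B"), "C")  (assumed format)
    coverage -- dict mapping species name -> set of gene names
    Each flagged node is reported as a sorted tuple of its species.
    """
    flagged = []

    def visit(node):
        if isinstance(node, str):
            return
        child_leaves = [leaves(child) for child in node]
        # Is there at least one cross-clade species pair with a common gene?
        shared = any(
            coverage[a] & coverage[b]
            for left, right in combinations(child_leaves, 2)
            for a, b in product(left, right)
        )
        if not shared:
            flagged.append(tuple(sorted(leaves(node))))
        for child in node:
            visit(child)

    visit(tree)
    return flagged


# Hypothetical toy data: the two clades under the root share no gene.
tree = ((("A", "B"), "C"), ("D", "E"))
coverage = {
    "A": {"g1"},
    "B": {"g1", "g2"},
    "C": {"g2"},
    "D": {"g3"},
    "E": {"g3"},
}
flagged = nodes_without_shared_genes(tree, coverage)
print(flagged)
```

In this toy example only the root is flagged, because genes g1 and g2 are confined to one side of the root and g3 to the other. The check needs only the species-gene matrix and the topology, so it can run before any divergence time estimation.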

Similar resources

Fast and Accurate Estimates of Divergence Times from Big Data.

Ongoing advances in sequencing technology have led to an explosive expansion in the molecular data available for building increasingly larger and more comprehensive timetrees. However, Bayesian relaxed-clock approaches frequently used to infer these timetrees impose a large computational burden and discourage critical assessment of the robustness of inferred times to model assumptions, influenc...

Data Mining for Identification of Forkhead Box O (FOXO3a) in Different Organisms Using Nucleotide and Tandem Repeat Sequences

 Background: Deregulation of FOXO3a gene which belongs to Forkhead box O (FOXO) transcription factors, can cause cancer (e.g. breast cancer). FOXO factors have important role in ubiquitination, acetylation, de-acetylation, protein-protein interactions and phosphorylation. Understanding the regulation and mechanisms of FOXO3a can lead to cancer treatment. The aim of this study recent association...

Ribulose-1,5-Bisphosphate Carboxylase/Oxygenase Gene Sequencing in Taxonomic Delineation of Padina Species in the Northern Coast of the Persian Gulf (Iran)

Taxonomic study of the genus Padina (Dictyotales, Phaeophyceae) from the Persian Gulf coast was conducted based on morphology and molecular phylogenetic analyses using chloroplast encoded large subunit RuBisCo (rbcL) gene sequences. Detailed descriptions of each species found in this study are described. Several morphological characters, such as number of cell layers composing the thallus, pr...

Morphological and molecular characterization of three new Fusarium species associated with inflorescence of wild grasses for Iran

In order to explore biodiversity of Fusarium species associated with the inflorescences of poaceous weeds, heads and inflorescences were collected from wild grasses in Ardabil province (Iran). Fusarium species were isolated using general and selective media. Pure cultures were established using a single spore technique. The isolates were identified based on morphological and molecular data. Seq...

Journal:
  • Molecular biology and evolution

Volume 31, Issue 9

Pages: -

Publication date: 2014